A Statistical Approach to Scanning the Biomedical Literature for Pharmacogenetics Knowledge

Authors

  • DL Rubin
  • CF Thorn
  • TE Klein
  • RB Altman
  • Patrick Herron
  • Can Rubin
Abstract

Design: The authors built and evaluated several candidate statistical models that characterize pharmacogenetics articles in terms of word usage and the profile of Medical Subject Headings (MeSH) used in those articles. The best-performing model was used to scan the entire Medline article database (11 million articles) to identify candidate pharmacogenetics articles.

Results: A sampling of the articles identified from scanning Medline was reviewed by a pharmacologist to assess the precision of the method. The authors' approach identified 4,892 pharmacogenetics articles in the literature with 92% precision. Their automated method took a fraction of the time that accumulating these articles manually would be expected to take. The authors have built a Web resource (http://pharmdemo.stanford.edu/pharmdb/main.spy) to provide access to their results.

Conclusion: A statistical classification approach can screen the primary literature for pharmacogenetics articles with high precision. Such methods may assist curators in acquiring pertinent literature in building biomedical databases.

J Am Med Inform Assoc. 2005;12:121–129. DOI 10.1197/jamia.M1640.

A challenge for biomedical researchers in the postgenomic era is to use prior knowledge about pharmacogenetics to understand the genetic basis for drug response and to predict drug response in individual patients. Pharmacogenetics research studies investigate how variations in particular genes alter the efficacy and toxicity of drugs. These studies contribute knowledge about which genes are involved in producing or altering a drug effect (mechanism of action or metabolism) and how genetic variations alter observable responses.
There are more than 3 million citations in Medline mentioning the term gene or drug. A resource that summarizes the gene–drug relationships that have been established, and the level of support for them in the literature, would help researchers identify gaps in current knowledge and plan new research studies. Other text sources such as drug patent literature or drug information databases could also provide useful data for this resource. Many databases collect genetic and protein sequences, structures, expression, and phenotype data, but few contain information that would permit scientists to explore possible connections between genes and drugs. Pharmacogenetics data are just beginning to become available online, and most are in unstructured free-text format. The few resources that contain pharmacogenetically relevant data focus on a subset of pharmacogenetics studies. A comprehensive resource of information that relates all genes and drugs does not yet exist.

We have been building PharmGKB, a resource for pharmacogenetics. One of the services PharmGKB provides is a curated repository of gene–drug relationships, annotated with literature support, contributed by curators and the scientific community. While our repository has grown steadily, the ability to create a comprehensive resource is limited by the enormity of the literature and the time it takes curators to identify relevant articles. The medical literature has been the primary venue for distributing the results of pharmacogenetics studies. One possibility for building a pharmacogenetics resource is to populate it directly from the literature by screening all published articles for pertinent pharmacogenetics content. Some biomedical

Affiliations of the authors: Section of Medical Informatics, Stanford University, Stanford, CA (DLR, TEK, RBA); Department of Genetics, Stanford Medical Center, Stanford, CA (CFT, TEK, RBA).
This work is supported by grants from the National Institute of General Medical Sciences (NIGMS), Human Genome Research Institute (NHGRI), National Library of Medicine (NLM), and the NIH/NIGMS Pharmacogenetics Research Network and Database (U01GM61374).

The authors thank John Conroy for his expertise and assistance in accessing the PharmGKB database.

Correspondence and reprints: Daniel L. Rubin, MD, MS, Section of Medical Informatics, MSOB X-215, Stanford, CA 94305; e-mail: .

Received for publication: 06/16/04; accepted for publication: 10/20/04.

Journal of the American Medical Informatics Association, Volume 12, Number 2, Mar/Apr 2005
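
The abstract describes scoring articles by word usage and MeSH profile, then estimating precision from a reviewed sample, but it does not specify the model. The following is a minimal sketch of that general idea, not the authors' method: it assumes a simple smoothed log-odds weighting of tokens (words or MeSH headings) in known relevant articles versus background articles. The function names `train_log_odds`, `score`, and `precision`, and all toy data, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_log_odds(relevant_docs, background_docs, smoothing=1.0):
    """Return per-token log-odds weights (relevant vs. background).

    Each document is a list of tokens (words or MeSH headings).
    Additive smoothing keeps unseen tokens from producing log(0).
    This weighting scheme is an assumption, not the paper's model.
    """
    rel_counts = Counter(tok for doc in relevant_docs for tok in doc)
    bg_counts = Counter(tok for doc in background_docs for tok in doc)
    vocab = set(rel_counts) | set(bg_counts)
    rel_total = sum(rel_counts.values()) + smoothing * len(vocab)
    bg_total = sum(bg_counts.values()) + smoothing * len(vocab)
    return {
        tok: math.log((rel_counts[tok] + smoothing) / rel_total)
        - math.log((bg_counts[tok] + smoothing) / bg_total)
        for tok in vocab
    }


def score(doc, weights):
    """Sum the log-odds of known tokens; unknown tokens contribute 0."""
    return sum(weights.get(tok, 0.0) for tok in doc)


def precision(reviewed_labels):
    """Fraction of reviewed candidates that the reviewer judged relevant."""
    if not reviewed_labels:
        raise ValueError("no reviewed articles")
    return sum(reviewed_labels) / len(reviewed_labels)


# Toy training data (hypothetical tokens, not drawn from Medline).
relevant = [
    ["cyp2d6", "polymorphism", "drug", "metabolism"],
    ["genotype", "drug", "response", "polymorphism"],
]
background = [
    ["surgery", "outcome", "patient"],
    ["drug", "dosage", "patient", "trial"],
]
weights = train_log_odds(relevant, background)

candidate = ["polymorphism", "drug", "response"]
other = ["surgery", "patient", "trial"]
print(score(candidate, weights) > score(other, weights))  # True

# Illustrative review outcome: 9 of 10 sampled candidates judged relevant.
print(precision([True] * 9 + [False]))  # 0.9
```

In a sketch like this, articles scoring above a chosen threshold would become candidates, and a reviewer's judgments on a random sample of those candidates would give the precision estimate. The threshold and sample size are design choices the abstract does not state.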

Similar resources

Application of Information Technology: A Statistical Approach to Scanning the Biomedical Literature for Pharmacogenetics Knowledge

Objective: Biomedical databases summarize current scientific knowledge, but they generally require years of laborious curation effort to build, focusing on identifying pertinent literature and data in the voluminous biomedical literature. It is difficult to manually extract useful information embedded in the large volumes of literature, and automated intelligent text analysis tools are becoming ...

Exploring Reading Comprehension Needs of Yasouj EAP Students of Persian Literature

Abstract: The main objective of the current English for Academic Purposes (EAP) programs in Iran is to fill the gap between the students' general English competence and their ability to read discipline-specific texts. This study aims to investigate the target and present reading comprehension needs of EAP undergraduate students of Persian literature at Yasouj State University through a mixed met...

A review on precision medicine and conducting pharmacogenetics tests of drugs

Precision medicine, the selection of treatment based on the genetic characteristics of patients, is one of the new paradigms of medical science. Using genetic characteristics and biomarkers, patients' response to different treatments is evaluated and finally, a specific one is selected for them. In other words, using genetic information or biomarkers, the safety, effectiveness and outcomes of t...

Indexing pharmacogenetic knowledge on the World Wide Web.

A key challenge for pharmacogenetics is the creation of databases to store, analyse and disseminate important datasets in order to catalyse research and training. Most successful databases have a limited scope: Genbank contains DNA sequences [1]; the Protein Data Bank contains the three-dimensional coordinates of macromolecules [2]; the Online Mendelian Inheritance in Man contains a record of h...

I-33: Pharmacogenetics of Reproductive Medicine

Adverse drug reactions (ADRs) are a major problem in drug therapy and drug development. Inter-individual genetic differences can have significant roles in determining an individual’s susceptibility to ADRs. The rapid development of techniques in the area of genome analysis has put the scientific community in a power position and facilitated identification of new pharmacogenomic biomarkers that ...

Publication date: 2005